CSU-Li-MC2

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Juncai Li, Central South University, dreair@csu.edu.cn

Quan Wang, Central South University, 997985134@qq.com

Pin Luo, Central South University, 122405379@qq.com

Yuan Zeng, Central South University, 844755848@qq.com

Ying Zhao, Central South University, zhaoying511@gmail.com    SUPERVISOR

Fangfang Zhou, Central South University, zhouffang@gmail.com    SUPERVISOR

Student Team:  YES

 

Did you use data from both mini-challenges? YES

 

Analytic Tools Used:

Python

Processing

MySQL

Tableau

Excel

 

Approximately how many hours were spent working on this submission in total?

About 150 hours (50 days at 3 hours/day)

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

Video:

index.files\CSU-Li-MC2-VideoDemo.wmv

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

At the beginning of our answer sheet, Table.1 gives an overview of our findings.

 

Table.1 An overview of our answer sheet.

 

MC2.1 – Identify those IDs that stand out for their large volumes of communication.  For each of these IDs

      a.   Characterize the communication patterns you see.

      b.   Based on these patterns, what do you hypothesize about these IDs?

(Limit your response to no more than 4 images and 300 words.)

 

Three person IDs stand out in the communication data: external, 839736 and 1278894.

They share two patterns: extremely high volumes and degrees of communication (Fig.1-1), and absence from the movement data.

(Note: in our answer sheet, a reference such as Fig.1-1 denotes the part labeled 1 in Fig.1.)

 

 

Fig1-SpecialPersonOverview

Fig.1 The overall analysis of communication volumes and degrees of persons.
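
The two measures plotted in Fig.1 can be sketched in Python as follows. This is a minimal illustration, not our actual tool code: the (sender, receiver) record layout, the function name and the sample rows are assumptions made for the example. Volume counts every message a person sends or receives, and degree counts the distinct persons he/she contacted.

```python
from collections import defaultdict

def volume_and_degree(messages):
    """Return {person: (volume, degree)} from (sender, receiver) pairs."""
    volume = defaultdict(int)      # messages sent plus messages received
    contacts = defaultdict(set)    # distinct communication partners
    for sender, receiver in messages:
        volume[sender] += 1
        volume[receiver] += 1
        contacts[sender].add(receiver)
        contacts[receiver].add(sender)
    return {p: (volume[p], len(contacts[p])) for p in volume}

# Hypothetical sample records, not taken from the challenge data
sample = [("A", "external"), ("B", "external"), ("A", "B"), ("C", "external")]
stats = volume_and_degree(sample)
```

Sorting the result by either value would surface IDs such as external, which dominate both measures.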

 

A detailed analysis of each of them follows.

 

(1)   P1-1: About personID-external

“external” has an extremely high volume and degree of communication (Fig.2-1).

“external” does not appear in the movement data (Fig.2-2).

“external” never sends messages or replies to anyone (Fig.2-3).

Communication involving “external” peaked at noon on Sunday (Fig.2-4).

As mentioned in the introduction of VAST Challenge 2015, “visitors can send texts to someone outside of the park”.

Therefore, we think this personID can stand for anyone outside the park, for example at the campgrounds and hotels near the park. We also guess that the park app may be able to connect with social networking sites such as Facebook.com and Twitter.com.

 

Fig2-External

Fig.2 An overview of the TMViz tool and the information about personID-external.

 

(2)   P1-2: About personID-839736

Communication between 839736 and any visitor always runs back and forth. 839736 shares this feature with 1278894; the difference is that visitors text 839736 first and then receive its replies, whereas 1278894 texts visitors first and then receives their replies. Fig.3-1 shows that visitor 9924 sent three texts to 839736 and promptly received three replies. As shown in Fig.3-2, 1278894 sent 12 texts to 9924, and 9924 replied 12 times.

The volume of communication involving 839736 was very high on Sunday afternoon, especially at 12:00 and 14:45 (Fig.4-1). Because of the crime, some attractions were closed on Sunday afternoon (see MC2.3 for the detailed analysis of the crime). We speculate that the large volume came from visitors asking why the attractions were out of service.

In summary, we think 839736 is a park service ID that answers visitors’ questions and helps with their difficulties.

 

Fig3-839736

Fig.3 An overview of the DNViz tool and the communication patterns between visitors and park-service IDs.

 

(3)   P1-3: About personID-1278894

Periodicity is the most important feature of 1278894. We analyze this feature in three parts.

Part 1: Communications involving 1278894 occurred in only five hours of the day: 12:00, 14:00, 16:00, 18:00 and 20:00 (Fig.4-2).

Part 2: Communications between 1278894 and visitors also occurred in these five hours (Fig.4-3).

Part 3: The peak hours of communication involving 1278894 differ from attraction to attraction. Fig.4-4 and Fig.4-5 give examples.

In summary, 1278894 is also a park service ID, one that provides a periodic information service. It may be related to interactive games between the park services and visitors. The different peak hours at different attractions may reflect how the games were set up in advance.
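
The hourly periodicity described in Parts 1 and 2 can be checked with a simple count per hour of day. The sketch below is only an illustration under assumptions: the (timestamp, sender, receiver) layout, the timestamp format, the IDs "svc" and "v1" and the sample rows are all made up for the example.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(records, person_id):
    """Count messages per hour of day that involve person_id."""
    counts = Counter()
    for ts, sender, receiver in records:
        if person_id in (sender, receiver):
            hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
            counts[hour] += 1
    return counts

# Hypothetical records: "svc" stands in for a periodic service ID
sample = [
    ("2014-06-06 12:01:10", "svc", "v1"),
    ("2014-06-06 12:03:45", "v1", "svc"),
    ("2014-06-06 14:00:30", "svc", "v1"),
    ("2014-06-06 15:20:00", "v1", "external"),
]
active_hours = sorted(hourly_counts(sample, "svc"))
```

An ID whose set of active hours stays small and regular across all days is a candidate for a scheduled service rather than a human visitor.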

 

Fig4-1278894

Fig.4 The temporal patterns of communications related to 839736 and 1278894.

 

Additionally, it is noteworthy that many persons have large volumes and degrees of communication (Fig.1-2). We guess they might be guides of big tour groups, managers of local areas or promoters of entertainment projects.

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

MC2.2 – Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where.

If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

(Limit your response to no more than 10 images and 1000 words.)

 

 

(1) P2-1: The overall temporal patterns of movement and communication.

 

Fig.5 reveals some overall temporal patterns of movement and communication.

1. The park is open from 8:00 to 23:00.

2. The number of visitors in the park: Sunday > Saturday > Friday.

3. Two timelines (the number of persons sending texts and the number of persons receiving texts) change with an obvious periodicity (Fig.5-1). The reason is explained in P1-3.

4. The peaks of communication at noon on Sunday are analyzed in MC2.3.

5. The park's rush hours may be 11:00 and 16:00, judging from the impulses in the timelines of communication and check-in. Here, we look more closely at the peak at 16:00 on Saturday. In Fig.5-3, the scatter plot of persons shows that the high volume of communication at that time was related to external and 1278894; map1 shows that Atmosfear was the hottest attraction; map2 shows that the hottest communication point was Grinosaurus Stage, because the superstar came onto the stage; and the matrix of communities shows some very active communities.

Additionally, the movement data for Friday night after 20:00 may be missing.

 

Fig.5 AllComm

Fig.5 The overall temporal patterns of movement and communication.

 

(2) P2-2: The overall characteristics of attraction types.

 

There are many types of attractions in the park, such as thrill rides, kiddie rides and shows.

We find some interesting communication characteristics of the attraction types:

1. The volume of communication at thrill rides is much higher than at other types of attractions, which suggests that visitors favor adventurous rides (Fig.6-1).

2. The second most popular types are shopping and food attractions.

3. The third tier consists of shows and rides for everyone.

4. Different attraction types have different temporal features of communication. For example, the peaks of communication at the three park entrances occur in the morning (Fig.6-2), and people prefer to go shopping in the evening (Fig.6-3).

 

Fig.6 Attractiontypes

Fig.6 The analysis of attraction types.

 

 

(3) P2-3: The overall communicating characteristics of communities.

 

We use a community detection algorithm (Fast-unfolding) to group the persons who often contacted each other in the communication data.

1. Communities fall into four categories according to their sizes (Fig.7-1).

2. There are no communities with 16-29 persons. Therefore, we think this gap is the watershed between self-guided tours and package tours.

3. Most of the standard-size communities visited the park on only one day, and have a few guides or leaders as communication centers. For example, Fig.7-2 shows that person-1789765 may be the guide of a community with 35 persons.

4. The big-size communities often visited the park on more than one day, and consisted of multiple smaller communities. An example is shown in Fig.7-3.

Additionally, 22% of persons (2,530) did not communicate with anyone.
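
Once every person has a community label, the size gap noted in item 2 can be found with a few lines of Python. The sketch below does not reproduce the Fast-unfolding step itself; it assumes a hypothetical person-to-community mapping as input, and the function names and sample labels are illustrative only.

```python
from collections import Counter

def community_sizes(membership):
    """Map community id -> number of members, given person -> community id."""
    return Counter(membership.values())

def sizes_missing(sizes, low, high):
    """Return the sizes in [low, high] that no community has."""
    present = set(sizes.values())
    return [s for s in range(low, high + 1) if s not in present]

# Hypothetical labels, not the output of our detection step
membership = {"p1": "c1", "p2": "c1", "p3": "c2",
              "p4": "c2", "p5": "c2", "p6": "c3"}
sizes = community_sizes(membership)
gap = sizes_missing(sizes, 1, 4)
```

With real labels, calling sizes_missing(sizes, 16, 29) would list every size from 16 to 29 if no community falls in that range.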

 

Fig.7 CommunityOverall

Fig.7. The overall communicating characteristics of communities

 

 

(4) P2-4: The differences between community and group.

 

We also group the persons with consistent trajectories in the movement data by using cosine similarity.

Here, we try to explain the differences between the results of community detection and group detection.

1. Groups also fall into four categories according to their sizes (Fig.7-1).

2. A standard-size community often has an equivalent group, because all of its members follow consistent trajectories. For example, Fig.8-1 shows that all the members of the community in Fig.8-2 kept consistent movement states on Sunday.

3. Similarly, in most family-size communities, the members always follow the same path through the park.

4. However, many team-size communities can be divided into different interest groups that follow different paths through the park. For example, Fig.8-2 shows that a community with 6 persons can be divided into two groups, and Fig.8-3 shows that the persons within a group contacted each other more.

5. There are some communities with more than 45 persons, but there is no group with more than 45 persons.
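
Grouping by cosine similarity can be sketched as below. This is a simplified illustration under assumptions, not our exact procedure: each trajectory is represented as a hypothetical vector of visit counts per attraction, the 0.95 threshold is arbitrary, and the greedy assignment to the first matching group is a stand-in for whatever clustering is applied on top of the similarity values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def group_by_trajectory(vectors, threshold=0.95):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []  # list of (representative vector, member ids)
    for person, vec in vectors.items():
        for rep, members in groups:
            if cosine_similarity(rep, vec) >= threshold:
                members.append(person)
                break
        else:
            groups.append((vec, [person]))
    return [members for _, members in groups]

# Hypothetical visit-count vectors over four attractions
vectors = {"p1": [3, 0, 1, 0], "p2": [3, 0, 1, 0], "p3": [0, 2, 0, 2]}
groups = group_by_trajectory(vectors)
```

Here p1 and p2 visit the same attractions and end up in one group, while p3 follows a disjoint path and forms its own group.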

 

Fig.DifferenceCommunityAndGroup

Fig.8 Some examples for explaining the differences between community and group.

 

 

(5) P2-5: The communication patterns at Creighton Pavilion (attraction-32, square x=32, y=33)

 

At Pavilion, there were two periods (9:30-11:30 and 14:30-16:00) without any check-in or communication records (Fig.9-1). As mentioned in the introduction of VAST Challenge 2015, "a show of memorabilia would be displayed in the park's Pavilion", and "Creighton Pavilion was closed and locked up tight before each show". We think Pavilion was closed during these two periods to prepare for the next show.
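
Quiet periods like these can be located by looking for long gaps between consecutive records at one location. The following Python sketch assumes a plain list of record times; the 60-minute threshold and the sample times are illustrative assumptions, not values from the challenge data.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap=timedelta(minutes=60)):
    """Return (start, end) pairs where consecutive records are at least min_gap apart."""
    times = sorted(timestamps)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a >= min_gap]

# Hypothetical record times at one attraction
raw = ["09:05", "09:28", "11:31", "11:40", "12:02"]
checkins = [datetime.strptime(t, "%H:%M") for t in raw]
gaps = find_gaps(checkins)
```

Each returned pair brackets a period with no records, which can then be compared with the show schedule.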

A series of strange things happened at Pavilion around noon on Sunday:

1. A peak of communication occurred around 11:40AM. Most of the messages were sent to external. (Fig.9-2)

2. Another peak of communication occurred around 12:00PM (noon). Most of the messages were sent to 839736. (Fig.9-3)

3. There were no check-in records on Sunday afternoon, and Pavilion seemed to be closed. (Fig.9-4)

These clues indicate that the crime might have happened at Pavilion around noon on Sunday.

See MC2.3 for our detailed analysis of the crime.

 

Fig. Pavilion

Fig.9 The analysis of Creighton Pavilion.

 

 

(6) P2-6: The communication patterns at Grinosaurus Stage (attraction-63, square x=76, y=22)

 

At G-Stage, two peaks of check-in indicate that there were two shows every day during Scott’s weekend, and the peaks of communication may indicate when the superstar, Scott Jones, came onto the stage.

On Sunday afternoon, G-Stage was closed because of the crime, and there were no check-in records (Fig.10-1). However, the peak of movement shows that many visitors still went to G-Stage to see Scott Jones in person. These disappointed visitors sent texts to park service ID 839736 to ask why G-Stage was closed. Accordingly, Fig.10-2 shows a high volume of communication related to 839736 at square (x=76, y=22) at 14:40.

 

Fig.10 Stage

Fig.10 The analysis of Grinosaurus Stage.

 

 

(7) P2-7: Some visitors only communicated with external and 839736.

 

We find 12 strange visitors:

1. They visited the park on only one day (three persons on Friday: 415491, 1081515, 1307724; four persons on Saturday: 417205, 626433, 953838, 1725365; and five persons on Sunday: 1149884, 1217381, 1601276, 1711922, 500084). (Fig.11-1)

2. After entering the park, they never checked in. (Fig.11-2)

3. They were not assigned to any community, because they only contacted external and 839736. Around noon on Sunday, the five Sunday visitors sent many texts to 839736 at Pavilion and Stage. (Fig.11-3)

In summary, we speculate that they may be plainclothes policemen, salesmen or reporters.
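
The filter behind this pattern can be expressed as a set comparison: keep the visitors whose communication partners all belong to a small allowed set. The Python sketch below is an illustration under assumptions; the (sender, receiver) layout, the function name and the visitor IDs v1 to v3 are hypothetical.

```python
from collections import defaultdict

def only_contacts(messages, allowed):
    """Return visitors whose every communication partner is in `allowed`."""
    partners = defaultdict(set)
    for sender, receiver in messages:
        partners[sender].add(receiver)
        partners[receiver].add(sender)
    return sorted(p for p, c in partners.items()
                  if p not in allowed and c <= allowed)

allowed = {"external", "839736"}
# Hypothetical messages; v1, v2 and v3 are made-up visitor IDs
sample = [("v1", "external"), ("v1", "839736"), ("839736", "v1"),
          ("v2", "external"), ("v2", "v3")]
result = only_contacts(sample, allowed)
```

Only v1 is returned, because v2 and v3 also contacted each other.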

 

Fig.11 plainclothepolice

Fig.11 Twelve strange visitors.

 

 

(8) P2-8: The events about sudden changes of communication volume at attractions.

 

At an attraction, a sudden change of communication volume usually indicates that something has happened. Here is a set of relevant events.

1. Sudden breaks:

1.1 At attraction-12 (Flying TyrAndrienkos), communication was interrupted from 10:15 to 10:55AM on Sunday (Fig.12-1). Because this was during the park's rush hour and no visitors checked in during that time, we speculate that this was an abnormal event. The attraction may have had an equipment problem.

1.2 A similar interruption happened at attraction-20 (Scholtz Express) from 13:20 to 13:50 on Saturday.

2. Sudden increases:

2.1 We find many short impulses in the timelines of communication volume at attractions, such as at attraction-1 (Wrightraptor Mountain) around 16:25 on Saturday and at attraction-8 (Atmosfear) around 11:25AM on Sunday.

2.2 At attraction-7 (Firefall), the communication volume on Saturday was clearly larger than on Sunday. We think this is an interesting pattern because more people visited the park on Sunday.

2.3 At attraction-66 (Tyrannosaurus Rest), a restroom near Grinosaurus Stage, a peak of communication occurred at 19:00 on Sunday. There were no similar peaks in the evening on Friday or Saturday. Moreover, Grinosaurus Stage was closed on Sunday afternoon because of the unexpected crime. Therefore, this is a notable pattern.

2.4 At attraction-62 (Liggement Fix-Me-Up-Information Assistance), a peak of communication occurred at 14:00 on Friday (Fig.12-2). We look more closely at this event. Attraction-62 is not a ride, and the number of check-in records shows that few people visited it at other times. We speculate that the peak of communication was caused by members of big-size communities gathering there. We did find that a few big-size communities were there at that time, such as a community with 68 persons and a community with 61 persons. Fig.12-3 shows that a big-size community with 61 persons sent many messages at attraction-12.
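
Both kinds of sudden change can be flagged automatically on a binned timeline of message counts. The Python sketch below is a simple heuristic under assumptions, not the method behind our figures: the bin width, the rule "more than three times the median" for an increase, the minimum of three empty bins for a break, and the sample counts are all illustrative.

```python
from statistics import median

def find_spikes(counts, factor=3.0):
    """Indices of bins whose count exceeds factor times the median count."""
    base = median(counts)
    return [i for i, c in enumerate(counts) if base > 0 and c > factor * base]

def find_breaks(counts, min_len=3):
    """(start, end) index ranges of at least min_len consecutive empty bins."""
    breaks, start = [], None
    for i, c in enumerate(counts + [1]):  # sentinel closes a trailing run
        if c == 0 and start is None:
            start = i
        elif c != 0 and start is not None:
            if i - start >= min_len:
                breaks.append((start, i - 1))
            start = None
    return breaks

# Hypothetical message counts per 5-minute bin at one attraction
counts = [4, 5, 3, 0, 0, 0, 0, 4, 21, 5, 4]
spikes = find_spikes(counts)
breaks = find_breaks(counts)
```

In this sample, bins 3 to 6 form a break and bin 8 is a spike. Flagged bins still need manual inspection, since an empty bin late at night is not unusual while one during rush hour is.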

 

Fig.12 Fix-up

Fig.12 Some examples of P2-8.

 

 

(9) P2-9: Other events related to external and 839736.

 

According to the analysis in MC2.1, we think the communications related to external and 839736 would suddenly increase when something unexpected happened in the park, for example when visitors suffered heatstroke or accidental injuries.

Following this idea, we find three notable events.

1. At attraction-2 (Galactosaurus Rage), the communications related to external and 839736 suddenly increased on Friday morning (Fig.13-1). This is a notable pattern because the number of visitors in the park was relatively small on Friday morning.

2. At attraction-5 (Wendisaurus Chase), a short impulse of communications related to external is conspicuous in Fig.13-2.

3. At attraction-3 (Auvilotops Express), the communications related to external and 839736 increased sharply on Sunday afternoon (Fig.13-3). This is an interesting pattern because the number of check-in records did not increase sharply.

All three attractions where these events happened are thrill rides. This suggests that accidents happen more easily at thrill rides.

 

Fig.13 Accidents

Fig.13 Other events related to external and 839736.

 

 

(10) P2-10: The hot points of communication in a community.

 

In a community, we can often find some hot points of communication. Some hot points arise from people who maintained close communication in the park, and others arise from many messages being sent or received in a short time.

Apart from the anomalies related to the crime, we do not find other particularly abnormal hot points of communication in a community. Therefore, we give just one simple example. Fig.14 shows three hot points of communication in a community with 50 persons.

 

Fig.HotPairCommunication

Fig.14 An example of hot points of communication in a community

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

MC2.3 – From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

(Limit your response to no more than 3 images and 300 words.)

 

In this section, we first explain when the crime was discovered, and then we give some clues related to the crime.

 

(1)   P3-1: Discover the crime at Creighton Pavilion (attraction-32).

 

As mentioned in the background, “Someone vandalized a pavilion exhibiting Jones’s memorabilia and made off with some irreplaceable items”. Therefore, we focus on the communications that occurred at Pavilion.

At Pavilion, the peak of communication related to “external” appeared 10 minutes earlier than the peak of communication related to the park service ID “839736” at noon on Sunday (Fig.15-1). This phenomenon shows that something abnormal happened at Pavilion. It also indicates that visitors tend to share news with friends first. Since the arrival of mobile Internet access, social media has been more responsive to "breaking news" than traditional media.

We give an example to further validate this important clue (Fig.15-2): a community of 4 people arrived at Pavilion at 11:34AM on Sunday. Person-620119 is a member of this community. After checking in, he/she immediately sent texts to visitors in the same community, and then contacted “external”. About 10 minutes later, he/she began to contact “839736”. Around 12:05PM (noon), he/she left Pavilion.

Based on the above, we speculate that the crime was first discovered by visitors at Pavilion around 11:35 on Sunday.
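
The lag between the two peaks can be measured by binning the messages to each recipient and comparing the busiest bins. The Python sketch below is a minimal illustration: times are given as minutes since midnight, the 5-minute bin size is an assumption, and the sample times are made up to mimic the pattern, not taken from the data.

```python
from collections import Counter

def peak_bin(minutes, bin_size=5):
    """Start minute of the busiest bin; ties go to the earliest bin."""
    bins = Counter((m // bin_size) * bin_size for m in minutes)
    return max(sorted(bins), key=bins.get)

# Hypothetical message times at one location, in minutes since midnight
to_external = [698, 700, 701, 702, 703, 710]
to_service = [705, 710, 711, 712, 713, 714]
lag = peak_bin(to_service) - peak_bin(to_external)
```

A positive lag means that messages to the outside world peaked before messages to the park service.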

 

Fig.15-DiscoverCrime

Fig.15. The communication patterns when the crime was discovered.

 

(2)   P3-2: Lock the most likely suspects.

 

We look for suspects among the visitors who had been at Pavilion on Sunday. Three persons capture our attention (personID = 461004, 416790, 1502920). They entered Pavilion at 9:01AM and stayed there until 11:30AM (Fig.16-1). Given the pattern in P3-1 and the provided information that “Creighton Pavilion was closed and locked up tight before each show”, the three persons behaved suspiciously. We speculate that they may have hidden in Pavilion and vandalized the exhibition while Pavilion was closed to prepare for the next show.

Their subsequent behaviors are even more interesting (Fig.16-2):

1. When the second show opened at 11:30AM, two of them checked in at Pavilion again, and then the three suspects left together at 11:32AM. We speculate that the person carrying the stolen items did not check in again.

2. They seemed to move very fast after leaving Pavilion. At 11:35AM, they checked in at attraction-6. We speculate that they were handing the stolen items over to their accomplices.

3. About ten minutes later, at 11:48AM, they returned to Pavilion. We speculate that they were checking on the situation at the crime scene. At 12:08PM, they arrived at attraction-7.

4. To behave like normal visitors, they sent many messages to “external” and “839736” around 11:40AM, and checked in at many attractions after the crime.

 

Fig-CrimeMovement

Fig.16 The movement patterns of the seven suspects.

 

 

(3)   P3-3: Look for the suspects’ accomplices.

 

Four persons had been in contact with the three suspects.

From the temporal distributions of communication, the 7 persons stayed in contact on Sunday (Fig.17-1), and the peaks indicate that they may have met in person around 17:00 (Fig.17-2 and Fig.16-3). We speculate that 461004 and 1123214 are their leaders because of their high volumes of communication in this community (Fig.17-3).

From their movements, the 7 persons can be divided into 3 groups (G1, G2, and G3 in Fig.16). After entering the park through the main entrance at 8:00AM on Sunday, the 3 groups took different paths. However, their paths show that they may have met several times. For example, shortly after the crime, there were two possible meetings: the three suspects may have met 1123214 and 1350546 at attraction-6 around 11:35AM, and 1000279 and 1187909 at attraction-7 around 12:08PM.

In summary, we suspect that these 4 persons are accomplices.
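
Possible meetings of this kind can be searched for by pairing check-ins at the same attraction that fall within a short time window. The Python sketch below is an illustration under assumptions, since we inspected the movements visually in Tableau: the (person, attraction, minute) layout, the 5-minute window and the sample records with persons s1 to s3 are hypothetical.

```python
from itertools import combinations

def possible_meetings(checkins, window=5):
    """Pairs of different persons who checked in at the same attraction
    within `window` minutes of each other."""
    meetings = []
    for (p1, a1, t1), (p2, a2, t2) in combinations(checkins, 2):
        if p1 != p2 and a1 == a2 and abs(t1 - t2) <= window:
            meetings.append((p1, p2, a1))
    return meetings

# Hypothetical check-ins: (person, attraction id, minutes since midnight)
checkins = [("s1", 6, 695), ("s2", 6, 696), ("s3", 7, 728),
            ("s1", 7, 729), ("s2", 32, 708)]
meetings = possible_meetings(checkins)
```

A check-in only shows that two persons were at the same attraction at about the same time, so each candidate pair is a possible meeting, not a confirmed one.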

 

Fig.17 CommunicationsOf7Suspects

Fig.17 The communication patterns among the seven suspects.

 

Additionally, we analyzed the movement data only briefly, using Tableau. Because we lack an in-depth analysis of the movement data, we are still curious about where the stolen goods were, why the suspects checked in at many attractions after the crime, and what else the suspects might have done.